
Adding GOMEMLIMIT to compactor libsonnet #1758

Merged: joe-elliott merged 5 commits into grafana:main from ie-pham:jennie/compactor-jsonnet on Sep 23, 2022

Conversation

@ie-pham (Contributor) commented Sep 22, 2022

What this PR does: Adding GOMEMLIMIT to compactor libsonnet

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ie-pham marked this pull request as ready for review September 23, 2022 16:23
  $.util.readinessProbe +
- (if $._config.variables_expansion then container.withArgsMixin(['--config.expand-env=true']) else {}),
+ (if $._config.variables_expansion then container.withArgsMixin(['--config.expand-env=true']) else {}) +
+ container.withEnvMixin([envVar.new('GOMEMLIMIT', $._config.compactor.resources.limits.memory + 'B')]),

A Collaborator commented on the diff:

If someone were to set _config.compactor.resources.limits.memory to null, I think this would break. An if statement like the one above should be able to check for that before adding the env var.

@ie-pham self-assigned this Sep 23, 2022
@joe-elliott merged commit 98364b2 into grafana:main Sep 23, 2022

@andreasgerstmayr (Contributor):

I'm curious, what is your experience with GOMEMLIMIT vs. memory ballast? Could we add GOMEMLIMIT to all components, and remove the ballast (with one caveat, that GOMEMLIMIT is only available since Go 1.19)?

@joe-elliott (Collaborator):

I haven't had a chance to reevaluate the impact of ballast in a modern Go application. I do think that GOMEMLIMIT and ballast are designed to serve different purposes.

Ballast is designed to signal to the GC that the application is expected to consume a large amount of memory. It prevents the GC from overworking during the startup phase of the application.
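
As a rough illustration of that idea, here is a minimal Go sketch of a memory ballast (not Tempo's actual code; the 1 GiB size is arbitrary):

```go
package main

import "runtime"

func main() {
	// Allocate 1 GiB that is never read or written. Most OSes won't back the
	// untouched pages with physical memory, but the GC still counts the slice
	// toward its heap goal, so collections run far less often while the real
	// heap grows toward its expected working size.
	ballast := make([]byte, 1<<30)

	// ... start the application here ...

	// Keep the ballast reachable for the lifetime of the process.
	runtime.KeepAlive(ballast)
}
```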

GOMEMLIMIT is meant to be a hard limit at which you are OK with expending a large amount of CPU (to the detriment of the application's main focus) on GC. This works quite well for compactors. We'd much rather slow down a compactor than have it OOM and throw away the last few minutes of work.
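
As a hedged illustration (the value here is made up, not Tempo's configuration): the limit this PR passes via the GOMEMLIMIT environment variable can also be set programmatically in Go 1.19+:

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to GOMEMLIMIT=2GiB. As total Go memory approaches this value,
	// the runtime collects more and more aggressively, trading CPU for staying
	// under the limit instead of OOMing.
	debug.SetMemoryLimit(2 << 30)

	// ... run the compactor workload here ...
}
```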

For other components it's not so obvious what we'd prefer. If a querier takes on too much work, is it better to OOM so the queries get retried, or to answer them far more slowly? We've experimented with GOMEMLIMIT on ingesters and, as the allocated memory approaches GOMEMLIMIT, the ingester starts failing a larger and larger percentage of its requests.

If you do some experimentation with these parameters please report back! The goal is to use these tools to create a Tempo that is easy to operate and fulfills the operational expectations of its users.

@andreasgerstmayr (Contributor):

Thanks for your detailed answer @joe-elliott!

Unfortunately I haven't had a chance to evaluate the parameters myself yet. From reading the blogs at https://weaviate.io/blog/2022/08/GOMEMLIMIT-a-Game-Changer-for-High-Memory-Applications.html and https://www.sobyte.net/post/2022-06/memory-ballast-gc-tuner/ I got the impression that GOMEMLIMIT makes the ballast obsolete, but I think you're right: there are two different scenarios in play, OOM (even if memory is available) and GC overworking.

There's also GOGC, which we can tune. https://go.dev/doc/gc-guide#Memory_limit has interesting (interactive) charts showing how GOGC and GOMEMLIMIT play together. In the end, I think these knobs need to be evaluated against actual workloads, as you did. Thanks for sharing your experience; I'll definitely report back when I get a chance to experiment with them.
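
For reference, a small sketch of how the two knobs from the guide fit together (illustrative values only; both can equally be set through the GOGC and GOMEMLIMIT environment variables):

```go
package main

import "runtime/debug"

func main() {
	// GOGC: trigger a collection when the live heap has grown by 100% since
	// the last GC (this is the default, shown for explicitness).
	debug.SetGCPercent(100)

	// GOMEMLIMIT: regardless of GOGC, collect more often as total Go memory
	// approaches roughly 4 GiB.
	debug.SetMemoryLimit(4 << 30)

	// The guide also discusses GOGC=off combined with a memory limit, so the
	// GC runs only as the limit is approached:
	// debug.SetGCPercent(-1)
}
```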

@andreasgerstmayr (Contributor):

I did some experimentation with ballast and GOMEMLIMIT. I created identical (except for the parameters under test) deployments of Tempo v1.5 in microservices mode, and used ghcr.io/grafana/xk6-client-tracing:v0.0.2 to generate load.

When using 3 load generator replicas with 1000 configured virtual users, each sending traces every 1-5 seconds, I observed on average 3.15 GC collections/minute for the distributor and 6.56 collections/minute for the ingester. With 1 GB ballast the numbers went down to 1.95 for the distributor and 2.85 for the ingester.

In relative terms that's an improvement of 38% and 56% respectively, but in absolute numbers at most 4 fewer GC operations per minute is negligible, I'd say.

A test with GOMEMLIMIT=2GiB showed excessive CPU usage in the ingester: the Go runtime couldn't keep the memory under 2GiB and ran GC collections constantly (at the expense of CPU).

I'm not 100% sure how to interpret the results. Maybe the load was too low (too few spans), or I focused on the wrong metric (I didn't look at the GC duration, as the GC should happen in the background per prometheus/client_golang#618 (comment)), or I made an error or drew a wrong conclusion in the test. Maybe the results are also fine and it's only relevant in huge deployments :)

For reference, here's the entire test setup including the manifests and Grafana dashboard: https://github.com/andreasgerstmayr/tempo-perf-eval/tree/main/ballast
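
As a hypothetical cross-check of the "GC collections per minute" figure (not part of the linked setup), the runtime's own counter can also be sampled directly instead of relying on the exported Prometheus metric:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// NumGC counts completed GC cycles; sampling it a minute apart gives the
	// same collections-per-minute figure discussed above.
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	time.Sleep(time.Minute)
	runtime.ReadMemStats(&after)
	fmt.Printf("GC cycles in the last minute: %d\n", after.NumGC-before.NumGC)
}
```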

@joe-elliott (Collaborator):

This analysis is excellent and makes sense to me. When setting GOMEMLIMIT=2GiB the ingester/distributor's CPU should spike dramatically since they are working so hard to keep the memory under the 2GiB value.

What about these results did you find surprising?

@andreasgerstmayr (Contributor):

What about these results did you find surprising?

I thought I'd see a much bigger improvement in absolute numbers - going from 3.15 GC collections per minute for the distributor and 6.56 for the ingester down to 1.95 and 2.85 respectively is not a big difference in absolute terms; I'd say it's negligible in this case.

I guess the synthetic load wasn't high enough, or Tempo's memory handling is more efficient than the application from Twitch (https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap/), which went from about 256 GC ops/min to about 4 ops/min. The Go GC's efficiency has probably also improved over the years.

@ie-pham deleted the jennie/compactor-jsonnet branch March 17, 2023 17:31
